 accuracy requirement



Energy-Efficient Autonomous Driving with Adaptive Perception and Robust Decision

arXiv.org Artificial Intelligence

Abstract: Autonomous driving is an emerging technology that is expected to bring significant social, economic, and environmental benefits. However, these benefits come with rising energy consumption by computation engines, limiting the driving range of vehicles, especially electric ones. Perception computing is typically the most power-intensive component, as it relies on large-scale deep learning models to extract environmental features. Recently, numerous studies have employed model compression techniques, such as sparsification, quantization, and distillation, to reduce computational consumption. However, these methods often result in either a substantial model size or a significant drop in perception accuracy compared to high-computation models. To address these challenges, we propose an energy-efficient autonomous driving framework, called EneAD, which includes an adaptive perception module and a robust decision module. In the adaptive perception module, a perception optimization strategy is designed from the perspective of data management and tuning. First, we manage multiple perception models with different computational consumption and adjust the execution framerate dynamically. Then, we define them as knobs and design a transferable tuning method based on Bayesian optimization to identify promising knob values that achieve low computation while maintaining desired accuracy. To adaptively switch the knob values in various traffic scenarios, a lightweight classification model is proposed to distinguish the perception difficulty in different scenarios. In the robust decision module, we propose a decision model based on reinforcement learning and design a regularization term to enhance driving stability in the face of perturbed perception results. EneAD can reduce perception consumption by 1.9× to 3.5× and thus improve driving range by 3.9% to 8.5%.
Autonomous driving has gained broad attention from the public during the last few years [1], [2]. With intelligence, the autonomous vehicle can have a more comprehensive perception of the surrounding traffic environment and make more reasonable driving decisions compared to human drivers. As a result, it is expected to bring society a large number of benefits, including improved mobility and a significant reduction in collisions. For example, the computing platform using the Nvidia AGX Orin SoC [4] has a Thermal Design Power (TDP) of 800 W. These power demands can also increase the thermal demands on a vehicle's climate-control system.


QPART: Adaptive Model Quantization and Dynamic Workload Balancing for Accuracy-aware Edge Inference

arXiv.org Artificial Intelligence

As machine learning inferences increasingly move to edge devices, adapting to diverse computational capabilities, hardware, and memory constraints becomes more critical. Instead of relying on a pre-trained model fixed for all future inference queries across diverse edge devices, we argue that planning an inference pattern with a request-specific model tailored to the device's computational capacity, accuracy requirements, and time constraints is more cost-efficient and robust to diverse scenarios. To this end, we propose an accuracy-aware and workload-balanced inference system that integrates joint model quantization and inference partitioning. In this approach, the server dynamically responds to inference queries by sending a quantized model and adaptively sharing the inference workload with the device. Meanwhile, the device's computational power, channel capacity, and accuracy requirements are considered when deciding. Furthermore, we introduce a new optimization framework for the inference system, incorporating joint model quantization and partitioning. Our approach optimizes layer-wise quantization bit width and partition points to minimize time consumption and cost while accounting for varying accuracy requirements of tasks through an accuracy degradation metric in our optimization model. To our knowledge, this work represents the first exploration of optimizing quantization layer-wise bit-width in the inference serving system, by introducing theoretical measurement of accuracy degradation. Simulation results demonstrate a substantial reduction in overall time and power consumption, with computation payloads decreasing by over 80% and accuracy degradation kept below 1%.
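
As a rough illustration of the joint choice described above, the sketch below brute-forces a partition point and a bit width for a toy model. It is a simplification under stated assumptions, not the paper's formulation: a single bit width is shared by all device-side layers instead of being chosen per layer, accuracy loss is a lookup table keyed by bit width, and the names `LayerProfile` and `plan_inference`, the cost model, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LayerProfile:
    flops: float     # compute cost of the layer
    params: float    # number of weights shipped if the layer runs on the device
    out_size: float  # number of activation values the layer emits


def plan_inference(layers, input_size, degradation, max_degradation,
                   device_speed, server_speed, bandwidth_bits):
    """Exhaustively pick (partition point, bit width) with the lowest modeled time.

    Layers [0, p) run on the device with weights quantized to `bits`;
    layers [p, n) run on the server at full precision (assumed lossless).
    Returns (total_time, p, bits), or None if nothing is feasible.
    """
    best = None
    for p in range(len(layers) + 1):
        # With no device-side layers there is nothing to quantize.
        options = degradation.items() if p > 0 else [(32, 0.0)]
        for bits, loss in options:
            if loss > max_degradation:
                continue  # violates the request's accuracy requirement
            download = sum(layer.params for layer in layers[:p]) * bits / bandwidth_bits
            device = sum(layer.flops for layer in layers[:p]) / device_speed
            # Activations (or the raw input when p == 0) are sent as 32-bit values.
            sent = layers[p - 1].out_size if p > 0 else input_size
            upload = sent * 32 / bandwidth_bits if p < len(layers) else 0.0
            server = sum(layer.flops for layer in layers[p:]) / server_speed
            total = download + device + upload + server
            if best is None or total < best[0]:
                best = (total, p, bits)
    return best


layers = [LayerProfile(flops=100, params=10, out_size=1),
          LayerProfile(flops=100, params=10, out_size=1)]
plan = plan_inference(layers, input_size=1000,
                      degradation={8: 0.005, 4: 0.02}, max_degradation=0.01,
                      device_speed=100, server_speed=1000, bandwidth_bits=100)
print(plan)
```

In this toy setting, a tighter accuracy bound rules out the 4-bit option, so the plan falls back to 8 bits at the same partition point. A real system would replace the exhaustive loop with a proper optimizer once bit widths vary per layer.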


AI-Assisted Decision-Making for Clinical Assessment of Auto-Segmented Contour Quality

arXiv.org Artificial Intelligence

Purpose: This study introduces a novel Deep Learning (DL)-based quality assessment (QA) approach specifically designed for evaluating auto-generated contours (auto-contours) in auto-segmentation for radiotherapy, with a focus on Online Adaptive Radiotherapy (OART). The proposed method leverages Bayesian Ordinal Classification (BOC), combined with calibrated thresholds derived from uncertainty quantification, to deliver confident QA predictions. This approach addresses key challenges in clinical auto-segmentation QA workflows such as the absence of ground truth contours, limited availability of manually labeled data, and inherent uncertainty in AI model predictions. Methods: We developed a BOC model to classify the quality of auto-contours and quantify uncertainty. To enhance predictive reliability, we implemented a calibration step to determine optimal uncertainty thresholds that meet specific clinical accuracy requirements. The method was validated under three distinct data availability scenarios: absence of manual labels, limited manual labeling, and extensive manual labeling. We specifically tested our method for auto-segmented rectum contours in prostate cancer radiotherapy. Geometric surrogate labels were employed in the absence of manual labels, transfer learning was applied when manual labels were limited, and direct use of manual labels was performed when extensive labeling was available. Results: The BOC model demonstrated robust performance across all data availability scenarios for confident predictions, with significant accuracy gains when pre-trained with surrogate labels and fine-tuned with limited manually labeled data. Specifically, fine-tuning the pretrained model with just 30 manually labeled cases and calibrating with 34 subjects achieved an accuracy of over 90% against manual labels in the test dataset.


Accuracy First: Selecting a Differential Privacy Level for Accuracy Constrained ERM

Neural Information Processing Systems

Traditional approaches to differential privacy assume a fixed privacy requirement ε for a computation, and attempt to maximize the accuracy of the computation subject to the privacy constraint. As differential privacy is increasingly deployed in practical settings, it may often be that there is instead a fixed accuracy requirement for a given computation and the data analyst would like to maximize the privacy of the computation subject to the accuracy constraint. This raises the question of how to find and run a maximally private empirical risk minimizer subject to a given accuracy requirement. We propose a general "noise reduction" framework that can apply to a variety of private empirical risk minimization (ERM) algorithms, using them to "search" the space of privacy levels to find the empirically strongest one that meets the accuracy constraint, and incurring only logarithmic overhead in the number of privacy levels searched. The privacy analysis of our algorithm leads naturally to a version of differential privacy where the privacy parameters are dependent on the data, which we term ex-post privacy, and which is related to the recently introduced notion of privacy odometers. We also give an ex-post privacy analysis of the classical AboveThreshold privacy tool, modifying it to allow for queries chosen depending on the database. Finally, we apply our approach to two common objective functions, regularized linear and logistic regression, and empirically compare our noise reduction methods to (i) inverting the theoretical utility guarantees of standard private ERM algorithms and (ii) a stronger, empirical baseline based on binary search.
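
To make the idea of searching over privacy levels concrete, here is a minimal sketch of the search skeleton only. It is not the noise reduction framework and makes no privacy claim: it assumes a sorted grid of candidate levels, assumes that any level larger than a passing one also passes, and ignores the privacy cost of each probe and of the accuracy check itself. The names `smallest_passing_level` and `meets_accuracy` are hypothetical.

```python
def smallest_passing_level(levels, meets_accuracy):
    """Binary-search sorted `levels` for the smallest one that passes.

    Assumes monotonicity: if a level passes, every larger level passes too.
    Returns None when no level passes.
    """
    lo, hi = 0, len(levels) - 1
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if meets_accuracy(levels[mid]):
            best = levels[mid]  # feasible; try a smaller (more private) level
            hi = mid - 1
        else:
            lo = mid + 1        # too noisy; move to a larger level
    return best


# Toy stand-in for "train at this privacy level and check the accuracy target".
candidate_levels = [0.1, 0.2, 0.4, 0.8, 1.6]
chosen = smallest_passing_level(candidate_levels, lambda eps: eps >= 0.4)
print(chosen)
```

A smaller ε means stronger privacy, so the search returns the most private candidate that still meets the target, using a number of probes logarithmic in the size of the grid.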


HawkVision: Low-Latency Modeless Edge AI Serving

arXiv.org Artificial Intelligence

Modeless ML inference is growing in popularity because it hides the complexity of model inference from users and caters to diverse user and application accuracy requirements. Previous work mostly focuses on modeless inference in data centers. To provide low-latency inference, in this paper, we promote modeless inference at the edge. The edge environment introduces additional challenges related to low power consumption, limited device memory, and volatile network environments. To address these challenges, we propose HawkVision, which provides low-latency modeless serving of vision DNNs. HawkVision leverages a two-layer edge-DC architecture that employs confidence scaling to reduce the number of model options while meeting diverse accuracy requirements. It also supports lossy inference under volatile network environments. Our experimental results show that HawkVision outperforms current serving systems by up to 1.6X in P99 latency for providing modeless service. Our FPGA prototype demonstrates similar performance at certain accuracy levels with up to a 3.34X reduction in power consumption.


MOSEL: Inference Serving Using Dynamic Modality Selection

arXiv.org Artificial Intelligence

Rapid advancements over the years have helped machine learning models reach previously hard-to-achieve goals, sometimes even exceeding human capabilities. However, to attain the desired accuracy, the model sizes and in turn their computational requirements have increased drastically. Thus, serving predictions from these models to meet any target latency and cost requirements of applications remains a key challenge, despite recent work in building inference-serving systems as well as algorithmic approaches that dynamically adapt models based on inputs. In this paper, we introduce a form of dynamism, modality selection, where we adaptively choose modalities from inference inputs while maintaining the model quality. We introduce MOSEL, an automated inference serving system for multi-modal ML models that carefully picks input modalities per request based on user-defined performance and accuracy requirements. MOSEL exploits modality configurations extensively, improving system throughput by 3.6$\times$ with an accuracy guarantee and shortening job completion times by 11$\times$.


Adaptive Workload Distribution for Accuracy-aware DNN Inference on Collaborative Edge Platforms

arXiv.org Artificial Intelligence

DNN inference can be accelerated by distributing the workload among a cluster of collaborative edge nodes. Heterogeneity among edge devices and accuracy-performance trade-offs of DNN models present a complex exploration space while catering to the inference performance requirements. In this work, we propose adaptive workload distribution for DNN inference, jointly considering node-level heterogeneity of edge devices, and application-specific accuracy and performance requirements. Our proposed approach combinatorially optimizes heterogeneity-aware workload partitioning and dynamic accuracy configuration of DNN models to ensure performance and accuracy guarantees. We tested our approach on an edge cluster of Odroid XU4, Raspberry Pi4, and Jetson Nano boards and achieved an average gain of 41.52% in performance and 5.2% in output accuracy as compared to state-of-the-art workload distribution strategies.
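
The heterogeneity-aware part of this idea can be illustrated with a very small sketch: split a batch across nodes in proportion to each node's measured throughput. This is an assumed simplification for illustration, not the paper's optimizer, and it leaves out the dynamic accuracy configuration entirely. The function name `split_workload` and the throughput figures are hypothetical.

```python
def split_workload(num_items, throughputs):
    """Split `num_items` across nodes in proportion to their throughput.

    Uses largest-remainder rounding so the shares always sum to `num_items`.
    """
    total = sum(throughputs)
    exact = [num_items * t / total for t in throughputs]
    shares = [int(x) for x in exact]
    leftover = num_items - sum(shares)
    # Hand the remaining items to the nodes with the largest fractional parts.
    order = sorted(range(len(exact)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


# Hypothetical relative throughputs for three boards of increasing speed.
shares = split_workload(100, [1, 3, 6])
print(shares)
```

With proportional shares, all nodes finish at roughly the same time, which is what keeps a slow board from becoming the bottleneck.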


Developing Multi-Agent Systems with Degrees of Neuro-Symbolic Integration [A Position Paper]

arXiv.org Artificial Intelligence

In this short position paper we highlight our ongoing work on verifiable heterogeneous multi-agent systems and, in particular, the complex (and often non-functional) issues that impact the choice of structure within each agent. So, our aim is to capture, in the goal specification G, key aspects that need to be considered/achieved relating to this goal.


Accuracy-Guaranteed Collaborative DNN Inference in Industrial IoT via Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Collaboration among industrial Internet of Things (IoT) devices and edge networks is essential to support computation-intensive deep neural network (DNN) inference services which require low delay and high accuracy. Sampling rate adaption, which dynamically configures the sampling rates of industrial IoT devices according to network conditions, is key to minimizing the service delay. In this paper, we investigate the collaborative DNN inference problem in industrial IoT networks. To capture the channel variation and task arrival randomness, we formulate the problem as a constrained Markov decision process (CMDP). Specifically, sampling rate adaption, inference task offloading, and edge computing resource allocation are jointly considered to minimize the average service delay while guaranteeing the long-term accuracy requirements of different inference services. Since the CMDP cannot be directly solved by general reinforcement learning (RL) algorithms due to the intractable long-term constraints, we first transform the CMDP into an MDP by leveraging the Lyapunov optimization technique. Then, a deep RL-based algorithm is proposed to solve the MDP. To expedite the training process, an optimization subroutine is embedded in the proposed algorithm to directly obtain the optimal edge computing resource allocation. Extensive simulation results are provided to demonstrate that the proposed RL-based algorithm can significantly reduce the average service delay while preserving long-term inference accuracy with a high probability.
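
The Lyapunov step mentioned above usually follows a standard pattern, sketched here in generic notation that is assumed for illustration and not taken from the paper: $a(t)$ is the accuracy achieved in slot $t$, $A_{\min}$ is the long-term accuracy requirement, $D(t)$ is the service delay, $Q(t)$ is a virtual queue that tracks accumulated accuracy deficit, and $V > 0$ is a trade-off weight.

```latex
% Long-term accuracy constraint of the CMDP
\lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\left[ a(t) \right] \ge A_{\min}

% Virtual queue: grows when accuracy falls short, drains when it exceeds the target
Q(t+1) = \max\left\{ Q(t) + A_{\min} - a(t),\; 0 \right\}

% Per-slot (unconstrained) objective obtained from drift-plus-penalty
\min \; V \, D(t) + Q(t) \left( A_{\min} - a(t) \right)
```

Keeping $Q(t)$ stable implies the long-term constraint is met, so the constrained problem becomes a per-slot cost that an ordinary RL algorithm can minimize. A larger $V$ favors low delay, while a smaller $V$ makes the agent react more strongly to accuracy deficit.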